feat: add api for preparing demo data #1370
Conversation
Walkthrough

This change introduces a new demo data utility: a Bash script that ingests demo data and sets up filters, alerts, and dashboards via an HTTP API. The backend adds new HTTP endpoints and handlers to support these operations, with robust error handling and support for multiple server modes. Supporting modules and routes are updated accordingly.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTP_Server
    participant BashScript
    participant IngestorNode
    Client->>HTTP_Server: GET /demodata?action=ingest|filters|alerts|dashboards
    HTTP_Server->>HTTP_Server: Parse query params, determine action
    alt action == "ingest" and mode == Ingest/All
        HTTP_Server->>BashScript: Execute ingest_demo_data.sh (ingest)
        BashScript->>HTTP_Server: Ingest demo data via API
    else action == "ingest" and mode == Query/Prism
        HTTP_Server->>IngestorNode: Forward request to ingestor
        IngestorNode->>BashScript: Execute ingest_demo_data.sh (ingest)
    else action == "filters" or "alerts" or "dashboards"
        HTTP_Server->>BashScript: Execute ingest_demo_data.sh (filters/alerts/dashboards)
    end
    HTTP_Server-->>Client: 202 Accepted / Error
```
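For orientation, a client invocation of the new endpoint might look like the following - a sketch assuming the script's default base URL (localhost:8000) and placeholder admin credentials, neither of which is confirmed by this page:

```bash
# Hypothetical client calls against the new endpoint; host, port, and
# credentials are assumptions based on the demo script defaults.
curl -u admin:admin "http://localhost:8000/api/v1/demodata?action=ingest"
curl -u admin:admin "http://localhost:8000/api/v1/demodata?action=filters"

# A successful request returns 202 Accepted; the work continues server-side.
```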
Actionable comments posted: 2
🧹 Nitpick comments (2)
resources/filters_demo_data.sh (1)

`38-144`: Address static analysis warnings and improve the curl function.

The `curl_with_retry` function is well implemented but has a couple of issues flagged by static analysis:

```diff
 curl_with_retry() {
     local url="$1"
     local method="$2"
     local data="$3"
     local content_type="${4:-application/json}"
     local max_retries="${5:-3}"
-    local base_timeout="${6:-15}"
     local retry_count=0

     # ... rest of function ...

     # Extract status code and response body separately to avoid masking return values
-    local status_code
     if [[ -n "$response" ]]; then
-        status_code=$(echo "$response" | tail -n1)
+        local status_code
+        status_code=$(echo "$response" | tail -n1)
         local response_body=$(echo "$response" | sed '$d')
```

The `base_timeout` parameter is unused and can be removed, since the function uses a fixed timeout calculation pattern.

resources/ingest_demo_data.sh (1)

`43-149`: Same static analysis issues as the filters script - fix the unused variable and the declaration pattern.

The `curl_with_retry` function has identical issues to the filters script:

```diff
 curl_with_retry() {
     local url="$1"
     local method="$2"
     local data="$3"
     local content_type="${4:-application/json}"
     local max_retries="${5:-3}"
-    local base_timeout="${6:-15}"
     local retry_count=0

     # ... rest of function ...

-    local status_code
     if [[ -n "$response" ]]; then
-        status_code=$(echo "$response" | tail -n1)
+        local status_code
+        status_code=$(echo "$response" | tail -n1)
         local response_body=$(echo "$response" | sed '$d')
```

Consider extracting this function to a shared utility script to avoid duplication.
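Taken together, a ShellCheck-clean shared version of the function might look like the following - a minimal sketch under the assumption that callers keep the five-argument form (auth headers and file payloads omitted for brevity):

```bash
#!/usr/bin/env bash
# Sketch of a shared curl_with_retry with both fixes applied: the unused
# base_timeout parameter (SC2034) is gone, and declarations are split from
# command-substitution assignments (SC2155).
curl_with_retry() {
    local url="$1" method="$2" data="$3"
    local content_type="${4:-application/json}"
    local max_retries="${5:-3}"
    local retry_count=0

    while (( retry_count < max_retries )); do
        local response
        response=$(curl -s -w $'\n%{http_code}' -X "$method" \
            -H "Content-Type: $content_type" --data "$data" "$url")
        if [[ -n "$response" ]]; then
            local status_code response_body
            status_code=$(echo "$response" | tail -n1)
            response_body=$(echo "$response" | sed '$d')
            if [[ "$status_code" =~ ^2 ]]; then
                echo "$response_body"
                return 0
            fi
        fi
        retry_count=$(( retry_count + 1 ))
        sleep $(( retry_count * 2 ))   # incremental back-off between attempts
    done
    return 1
}
```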
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- resources/filters_demo_data.sh (1 hunks)
- resources/ingest_demo_data.sh (1 hunks)
- src/handlers/http/cluster/mod.rs (2 hunks)
- src/handlers/http/demo_data.rs (1 hunks)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/mod.rs (1 hunks)
- src/handlers/http/modal/ingest_server.rs (1 hunks)
- src/handlers/http/modal/query_server.rs (1 hunks)
- src/handlers/http/modal/server.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
src/handlers/http/demo_data.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
src/handlers/http/cluster/mod.rs (2)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/handlers/http/health_check.rs:81-90
Timestamp: 2025-06-16T02:04:58.990Z
Learning: In the shutdown function in src/handlers/http/health_check.rs, the design approach is to log errors from sync operations rather than propagate them. This is intentional because the shutdown function is called on SIGTERM/SIGINT signals, and the goal is to perform best-effort cleanup (syncing pending files to object storage) while allowing the shutdown to proceed regardless of sync failures. Logging provides debugging information without blocking the shutdown process.
🧬 Code Graph Analysis (2)
src/handlers/http/modal/ingest_server.rs (1)
src/handlers/http/modal/server.rs (2)
- get_readiness_factory (692-696)
- get_demo_data_webscope (206-208)
src/handlers/http/modal/query_server.rs (1)
src/handlers/http/modal/server.rs (1)
- get_demo_data_webscope (206-208)
🪛 Shellcheck (0.10.0)
resources/filters_demo_data.sh
[warning] 44-44: base_timeout appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 100-100: Declare and assign separately to avoid masking return values.
(SC2155)
resources/ingest_demo_data.sh
[warning] 49-49: base_timeout appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 105-105: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
🔇 Additional comments (15)
src/handlers/http/mod.rs (1)

`37-37`: LGTM! Module declaration follows existing patterns.

The addition of the `demo_data` module declaration is consistent with the existing module organization and properly integrates the new functionality.

src/handlers/http/modal/query_server.rs (1)

`78-79`: Consistent service registration for the query server.

The demo data web scope service is properly added to the query server configuration, maintaining the existing service chain pattern.

src/handlers/http/modal/ingest_server.rs (1)

`81-82`: Excellent consistency across server configurations.

The demo data service registration in the ingest server matches the pattern used in the query server, ensuring consistent API availability across different server modes.

src/handlers/http/modal/server.rs (3)

`26-26`: Import statement is correctly placed.

The import of `get_demo_data` from the demo_data module is properly placed with other handler imports.

`101-102`: Service registration maintains consistency.

The demo data web scope service is appropriately added to the main server configuration, consistent with the pattern used in the query and ingest servers.

`206-208`: Web scope definition aligns with API requirements.

The `get_demo_data_webscope()` function correctly creates a scope at `/demodata` with a GET route, matching the API specification in the PR objectives.

src/handlers/http/ingest.rs (2)

`496-499`: Well-designed error variants for parameter validation.

The new `InvalidQueryParameter` and `MissingQueryParameter` error variants provide clear, descriptive error handling for query parameter validation scenarios. The error messages are user-friendly and appropriate.

`529-530`: Correct HTTP status code mapping.

Both new error variants appropriately map to `BAD_REQUEST` (400), which is the correct HTTP response for client-side parameter validation errors.

src/handlers/http/cluster/mod.rs (2)

`159-213`: LGTM! Well-implemented ingestor forwarding function.

The function correctly:

- Validates ingestor availability through liveness checks
- Uses the first available ingestor for load distribution
- Handles authentication and proper URL construction
- Provides comprehensive error handling

The design choice to break after finding the first live ingestor is appropriate for this use case.

`978-978`: Good generalization of the pattern match.

Changing from `Ok(None)` to `Ok(_)` appropriately handles all successful cases that don't contain metrics, without being overly specific.

resources/filters_demo_data.sh (2)
`167-216`: Well-structured filter creation with good error handling.

The SQL filter creation logic is solid:

- Good variety of realistic filter queries
- Proper JSON escaping for quotes
- Retry logic with appropriate delays
- Clear progress reporting

The filter definitions cover useful monitoring scenarios and will provide good demo data for users.

`218-271`: Comprehensive saved filter implementation.

The saved filter creation includes proper:

- Visible column configuration
- Group-by field specification
- JSON array construction for columns
- Consistent error handling pattern

The filters complement the SQL filters well and provide a complete demo experience.
resources/ingest_demo_data.sh (3)

`161-189`: Excellent performance optimization with static data pre-computation.

The approach of pre-computing all random data once and cycling through arrays is a smart optimization that will significantly improve ingestion performance:

- Eliminates runtime random number generation overhead
- Provides consistent, predictable data patterns
- Reduces CPU usage during batch generation
- Maintains data variety through array cycling

This is a well-engineered solution for high-throughput demo data generation.
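A minimal sketch of the pattern being praised - the array name, field name, and function name here are illustrative, not the script's actual identifiers:

```bash
#!/usr/bin/env bash
# Pre-compute a fixed pool of random values once, then cycle through it with
# modulo indexing while generating batches - no per-record process spawns.
declare -a IPS=()
for ((i = 0; i < 100; i++)); do
    IPS+=("10.0.$((RANDOM % 256)).$((RANDOM % 256))")
done

generate_batch() {
    local batch_size=$1 out="" n
    for ((n = 0; n < batch_size; n++)); do
        local ip=${IPS[n % ${#IPS[@]}]}    # cycle through the pre-computed pool
        out+="{\"source_ip\":\"$ip\"},"
    done
    printf '[%s]\n' "${out%,}"             # strip trailing comma, emit a JSON array
}

generate_batch 3
```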
`192-263`: Robust batch generation with good validation.

The `generate_batch` function includes:

- Input validation for batch size and array initialization
- Efficient array cycling with modulo operations
- Proper JSON escaping for user agent strings
- Severity mapping based on HTTP status codes
- Size validation of generated output

The single-line JSON generation approach is efficient while maintaining readability.
`312-377`: Well-implemented main execution loop with comprehensive monitoring.

The execution flow provides:

- Progress tracking with rate calculations
- Proper error handling and early exit on failures
- Batch size adjustment for remaining records
- Performance statistics and optimization details
- Appropriate delays to prevent server overload

The final performance-tips output is a nice touch for users to understand the optimizations.
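The loop shape described above might be sketched like this - variable names follow the review excerpts, and `send_batch` is a hypothetical stub standing in for the real ingestion call:

```bash
#!/usr/bin/env bash
send_batch() { :; }   # placeholder for the script's real ingestion function

TARGET_RECORDS=10000
BATCH_SIZE=500
RECORDS_SENT=0
START_TIME=$(date +%s)

while (( RECORDS_SENT < TARGET_RECORDS )); do
    remaining=$(( TARGET_RECORDS - RECORDS_SENT ))
    batch=$(( remaining < BATCH_SIZE ? remaining : BATCH_SIZE ))   # last batch may be smaller
    send_batch "$batch" || { echo "batch failed, aborting" >&2; exit 1; }
    RECORDS_SENT=$(( RECORDS_SENT + batch ))

    elapsed=$(( $(date +%s) - START_TIME ))
    rate=$(( RECORDS_SENT / (elapsed == 0 ? 1 : elapsed) ))
    echo "progress: $(( RECORDS_SENT * 100 / TARGET_RECORDS ))% (${RECORDS_SENT}/${TARGET_RECORDS}), ${rate} records/s"
    sleep 0.1   # brief pause so the server is not overwhelmed
done
```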
Force-pushed from ecbf80d to a7387c1
Actionable comments posted: 0
♻️ Duplicate comments (3)

src/handlers/http/demo_data.rs (2)

`33-90`: LGTM with remaining security considerations.

The handler implementation correctly validates query parameters, uses appropriate async execution patterns, and returns HTTP 202 for fire-and-forget operations. However, there are still security concerns that need to be addressed: the implementation meets most requirements but still has the credential exposure issue mentioned in previous reviews. Consider implementing credential passing via secure temporary files instead of environment variables.

`92-151`: Improve security by avoiding credential exposure via environment variables.

The script execution function has good error handling and temporary file management, but still passes sensitive credentials through environment variables (lines 125-127), which can be visible in process lists.

Consider this more secure approach for passing credentials:

```diff
 async fn execute_demo_script(
     action: &str,
     url: &str,
     username: &str,
     password: &str,
 ) -> Result<(), anyhow::Error> {
+    // Create secure temporary file for credentials
+    let cred_file = tempfile::NamedTempFile::new()
+        .map_err(|e| anyhow::anyhow!("Failed to create credentials file: {}", e))?;
+
+    let cred_content = format!("P_URL={}\nP_USERNAME={}\nP_PASSWORD={}", url, username, password);
+    fs::write(cred_file.path(), cred_content)
+        .map_err(|e| anyhow::anyhow!("Failed to write credentials: {}", e))?;
+
+    #[cfg(unix)]
+    {
+        let mut permissions = fs::metadata(cred_file.path())?.permissions();
+        permissions.set_mode(0o600); // Read/write for owner only
+        fs::set_permissions(cred_file.path(), permissions)?;
+    }

     // ... existing temp file creation code ...

     let output = Command::new("bash")
         .arg(temp_path)
-        .env("P_URL", url)
-        .env("P_USERNAME", username)
-        .env("P_PASSWORD", password)
+        .env("CRED_FILE", cred_file.path())
         .env("DEMO_ACTION", action)
         .output()
     // ... rest unchanged
```

Then update the scripts to source credentials from the secure file instead of environment variables.
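On the script side, the counterpart would be something like this sketch, where `CRED_FILE` carries the variable assignments written by the handler in the diff above (an assumed contract, not confirmed code):

```bash
# Read credentials from the file the handler points us at, instead of
# the process environment.
if [[ -n "${CRED_FILE:-}" && -r "$CRED_FILE" ]]; then
    # shellcheck disable=SC1090  # path is only known at runtime
    source "$CRED_FILE"
else
    echo "CRED_FILE not provided or unreadable" >&2
    exit 1
fi
```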
resources/ingest_demo_data.sh (1)

`15-95`: Duplicate code detected - consider shared utilities.

This `curl_with_retry` function is identical to the one in `resources/filters_demo_data.sh`. Consider extracting it to a shared utility script to follow DRY principles.

Refer to the refactoring suggestion in the filters script review for creating a shared `common_utils.sh` file.
🧹 Nitpick comments (2)

resources/filters_demo_data.sh (1)

`13-93`: Consider extracting shared utility functions.

The `curl_with_retry` function is duplicated between this script and `resources/ingest_demo_data.sh`. Consider creating a shared utility script to reduce code duplication.

Create a shared utility file `resources/common_utils.sh`:

```bash
#!/usr/bin/env bash
# Common utility functions for demo scripts

curl_with_retry() {
    # ... move the current implementation here
}

# Other shared functions
```

Then source it in both scripts:

```diff
 #!/usr/bin/env bash
+source "$(dirname "$0")/common_utils.sh"

 # Configuration
 P_URL=${P_URL:-"http://localhost:8000"}
 # ... rest of script
```

resources/ingest_demo_data.sh (1)

`208-210`: Consider utilizing computed metrics for monitoring.

The static analysis correctly identifies that `progress`, `rate`, and `FINAL_RATE` are computed but not used. While they're not currently utilized, they provide valuable monitoring data.

Consider adding optional verbose output to utilize these metrics:

```diff
 # Progress tracking
 progress=$((RECORDS_SENT * 100 / TARGET_RECORDS))
 elapsed=$(($(date +%s) - START_TIME))
 rate=$((RECORDS_SENT / (elapsed == 0 ? 1 : elapsed)))
+# Optional progress output (uncomment for verbose mode)
+# echo "Progress: ${progress}% (${RECORDS_SENT}/${TARGET_RECORDS}) - Rate: ${rate} records/sec"
```

And at the end:

```diff
 # Final statistics
 TOTAL_TIME=$(($(date +%s) - START_TIME))
 FINAL_RATE=$((TARGET_RECORDS / (TOTAL_TIME == 0 ? 1 : TOTAL_TIME)))
+# echo "Ingestion completed: ${TARGET_RECORDS} records in ${TOTAL_TIME}s (${FINAL_RATE} records/sec)"
```

Also applies to: 230-230
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- Cargo.toml (1 hunks)
- resources/filters_demo_data.sh (1 hunks)
- resources/ingest_demo_data.sh (1 hunks)
- src/handlers/http/cluster/mod.rs (2 hunks)
- src/handlers/http/demo_data.rs (1 hunks)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/mod.rs (1 hunks)
- src/handlers/http/modal/ingest_server.rs (1 hunks)
- src/handlers/http/modal/query_server.rs (1 hunks)
- src/handlers/http/modal/server.rs (3 hunks)
✅ Files skipped from review due to trivial changes (2)
- Cargo.toml
- src/handlers/http/modal/query_server.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/mod.rs
- src/handlers/http/ingest.rs
- src/handlers/http/modal/server.rs
- src/handlers/http/cluster/mod.rs
🧰 Additional context used
🧠 Learnings (1)
src/handlers/http/demo_data.rs (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:319-331
Timestamp: 2025-06-16T09:50:38.636Z
Learning: In Parseable's Ingest or Query mode, the node_id is always available because it's generated during server initialization itself, before the get_node_id_string() function in streams.rs would be called. This makes the .expect() calls on QUERIER_META.get() and INGESTOR_META.get() safe in this context.
🧬 Code Graph Analysis (1)
resources/ingest_demo_data.sh (1)
resources/filters_demo_data.sh (1)
- curl_with_retry (13-93)
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh
[warning] 208-208: progress appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 210-210: rate appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 230-230: FINAL_RATE appears unused. Verify use (or export if used externally).
(SC2034)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
🔇 Additional comments (4)
resources/filters_demo_data.sh (2)

`13-93`: Excellent retry logic implementation.

The `curl_with_retry` function has robust error handling, timeout management, and proper cleanup of temporary files. The implementation correctly handles different HTTP status codes and provides incremental timeout increases on retries.

`96-176`: Well-structured filter creation with proper JSON handling.

Both `create_sql_filters` and `create_saved_filters` functions demonstrate good practices:

- Proper JSON escaping for user input
- Efficient array-based configuration
- Rate limiting between requests
- Error counting for monitoring
resources/ingest_demo_data.sh (2)

`104-111`: Efficient synthetic data generation approach.

The pre-computation of static arrays for cycling is an excellent optimization that avoids repeated random generation during batch creation. The use of realistic data patterns (IP addresses, timestamps, trace IDs) makes the demo data valuable for testing.

`122-186`: Robust batch generation with proper JSON construction.

The `generate_batch` function demonstrates good practices:

- Proper modulo cycling through pre-computed arrays
- JSON escaping for user agents
- Realistic log entry structure with all required fields
- Severity mapping based on HTTP status codes (sketched below)
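The status-to-severity mapping mentioned in the last bullet presumably follows the usual log-level thresholds; the script's exact thresholds and labels are not quoted in the review, so this is an assumed sketch:

```bash
# Map an HTTP status code to a log severity (thresholds and labels assumed).
map_severity() {
    local status=$1
    if   (( status >= 500 )); then echo "ERROR"
    elif (( status >= 400 )); then echo "WARN"
    else                           echo "INFO"
    fi
}

map_severity 503   # -> ERROR
```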
Actionable comments posted: 0
🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

`72-74`: Address the static analysis warning about variable assignment.

The static analysis tool correctly identified a potential issue where combining declaration and command substitution could mask return values.

Apply this diff to address the warning:

```diff
-    if [[ -n "$response" ]]; then
-        status_code=$(echo "$response" | tail -n1)
-        local response_body=$(echo "$response" | sed '$d')
+    if [[ -n "$response" ]]; then
+        local status_code
+        status_code=$(echo "$response" | tail -n1)
+        local response_body
+        response_body=$(echo "$response" | sed '$d')
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- resources/ingest_demo_data.sh (1 hunks)
- src/handlers/http/demo_data.rs (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh
[warning] 74-74: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
🔇 Additional comments (10)
src/handlers/http/demo_data.rs (3)

`33-45`: Query parameter validation looks good.

The implementation correctly validates query parameters, handles empty query maps, and extracts the required action parameter with proper error handling.

`52-82`: Action routing and mode handling implementation is well-structured.

The action-based routing correctly handles different modes:

- Ingest/All modes execute scripts locally
- Query/Prism modes forward to ingestor services
- Proper error handling for unsupported modes

The asynchronous execution pattern with fire-and-forget for script execution is appropriate for this use case.

`85-139`: Script execution implementation addresses previous security concerns.

The implementation now uses temporary files instead of constructing script paths, which addresses the directory traversal concerns from previous reviews. The error handling is comprehensive, covering file creation, permission setting, script execution, and exit status validation.

The approach of writing the embedded script to a temporary file and making it executable is secure and appropriate.
resources/ingest_demo_data.sh (7)

`1-14`: Configuration setup is well-structured.

The script provides sensible defaults for all configuration variables and uses proper environment variable handling.

`15-105`: Robust HTTP retry logic with comprehensive error handling.

The `curl_with_retry` function implements:

- Proper timeout escalation
- Temporary file management for large payloads
- HTTP status code validation
- Clean resource cleanup

The implementation handles both string data and file-based data appropriately.

`109-200`: Efficient data generation with pre-computed arrays.

The ingestion functionality uses smart optimizations:

- Pre-computed arrays for cycling through data
- Batch generation with realistic log patterns
- Proper JSON escaping and formatting

The approach of generating 100 pre-computed values and cycling through them is efficient for demo data generation.

`214-255`: Well-implemented ingestion loop with progress tracking.

The main ingestion function provides:

- Progress reporting with rate calculations
- Proper batch size handling for remaining records
- Error handling with early exit on failures
- Final statistics reporting

The implementation correctly handles edge cases like remaining records less than batch size.

`259-356`: Comprehensive filter creation with proper JSON handling.

The filters functionality creates both SQL and saved filters with:

- Proper JSON escaping for queries and descriptions
- A comprehensive set of useful demo filters
- Success tracking and error reporting
- Appropriate delays between requests

The filter definitions are well thought out and provide good examples for demo purposes.

`360-456`: Alert creation with proper target management.

The alerts functionality:

- Creates webhook targets and extracts IDs correctly (see the sketch below)
- Defines realistic alert conditions
- Implements proper error handling and response validation
- Uses the extracted target ID for alert association

The alert definitions are practical and demonstrate the alerting capabilities well.
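The target-creation and ID-extraction step presumably resembles this sketch; the payload fields and the response's `id` field are assumptions, and `jq` is the dependency a later review asks to pre-flight:

```bash
# Create a webhook target, then pull its server-assigned id from the response.
response=$(curl -s -u "$P_USERNAME:$P_PASSWORD" \
    -H "Content-Type: application/json" \
    -X POST "$P_URL/api/v1/targets" \
    -d '{"name":"demo-webhook","type":"webhook"}')   # payload shape assumed

target_id=$(echo "$response" | jq -r '.id')          # response field name assumed
if [[ -z "$target_id" || "$target_id" == "null" ]]; then
    echo "target creation failed: $response" >&2
    exit 1
fi
```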
`484-520`: Clean main execution logic with proper action handling.

The main function provides:

- Clear action-based routing
- Sequential execution for the "all" action with appropriate delays
- Help functionality
- Proper error handling for unknown actions

The script structure is well-organized and user-friendly.
Actionable comments posted: 2
🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

`79-82`: SC2155: avoid masking return values.

Declaring and assigning in the same statement can mask errors. Split the declaration:

```diff
-    status_code=$(echo "$response" | tail -n1)
-    local response_body=$(echo "$response" | sed '$d')
+    local status_code
+    status_code=$(echo "$response" | tail -n1)
+    local response_body
+    response_body=$(echo "$response" | sed '$d')
```

Minor, but this eliminates the ShellCheck warning and improves readability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (1)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh
[warning] 81-81: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (1)
resources/ingest_demo_data.sh (1)

`210-218`: `send_batch` passes a superfluous argument.

`curl_with_retry` (current definition) ignores the 6th positional parameter, so the hard-coded `15` here is dead code and confuses future maintainers:

```diff
-    curl_with_retry "$P_URL/api/v1/ingest" "POST" "$data" "application/json" 3 15
+    curl_with_retry "$P_URL/api/v1/ingest" "POST" "$data" "application/json" 3
```

After harmonising the function signature (see earlier comment), adjust all similar calls.
Actionable comments posted: 1
🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

`16-24`: `curl_with_retry` interface is still under-specified.

The 6th positional parameter (`data_file`) is now consumed (`local data_file="$6"`), but neither the header comments nor the call-sites give any hint that such a parameter exists. Every call passes five arguments at most, which is fine, yet the silent extra slot makes misuse easy.

```diff
-# Common curl function with retry logic
-curl_with_retry() {
+# Common curl wrapper
+#   $1 url            (required)
+#   $2 method         (POST|PUT) (required)
+#   $3 data string    (optional)
+#   $4 content-type   (default: application/json)
+#   $5 max retries    (default: 3)
+#   $6 data file path (optional)
+curl_with_retry() {
```

Consider:

- Documenting the full signature as above.
- Guarding against callers that supply both `$3` and `$6`.
- Using named parameters (e.g. `curl_with_retry --url … --data-file …`) to make misuse impossible; a sketch follows below.
: HTTP-code extraction drops the body when it mattersYou rely on appending a line break and
%{http_code}
then stripping it back withtail -n1
/sed '$d'
.
Problems:
- Bodies that naturally end with a newline will shift the index (status code becomes empty).
- Binary responses (even small PNGs from dashboards) may embed newlines, breaking the split.
Prefer the built-in separation:
tmp_body=$(mktemp) status_code=$(curl "${curl_args[@]}" -o "$tmp_body" -w '%{http_code}') curl_exit_code=$? # … use "$status_code" and "$(cat "$tmp_body")" rm -f "$tmp_body"This is more robust and removes the need for
sed
andtail
.
`831-840`: `show_usage` implies a positional argument, but the script uses the `$ACTION` env var.

The help text suggests `script.sh ingest`, yet `main()` never looks at `$1`; it only inspects the `ACTION` environment variable. Update the usage string (or parse `$1`) to avoid user confusion; one option is sketched below.
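One way to reconcile the two (a sketch; the `main` and `show_usage` stubs stand in for the script's real functions):

```bash
main() { echo "running action: $ACTION"; }   # stub for the script's real dispatcher
show_usage() { echo "usage: $0 [ingest|filters|alerts|dashboards|all]"; }

# Prefer a positional argument when given, fall back to the ACTION env var.
ACTION="${1:-${ACTION:-ingest}}"

case "$ACTION" in
    ingest|filters|alerts|dashboards|all) main ;;
    -h|--help|help) show_usage ;;
    *) echo "unknown action: $ACTION" >&2; show_usage; exit 1 ;;
esac
```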
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh
[warning] 81-81: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: coverage
Actionable comments posted: 3
🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

`129-134`: Use a stronger entropy source for IDs.

`$((RANDOM * RANDOM))` yields at most a 30-bit value, so the 32-hex-digit trace-/span-IDs are heavily zero-padded and far from unique at 10,000 records. Prefer `/dev/urandom` or `uuidgen` for realistic IDs.

```diff
-        TRACE_IDS+=("$(printf '%032x' $((RANDOM * RANDOM)))")
-        SPAN_IDS+=("$(printf '%016x' $((RANDOM * RANDOM)))")
+        TRACE_IDS+=("$(uuidgen | tr -d '-')")        # 32-hex chars
+        SPAN_IDS+=("$(xxd -l 8 -p /dev/urandom)")    # 16-hex chars
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh
[warning] 84-84: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
Actionable comments posted: 1
🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

`128-135`: `UNIX_NANOS` lacks entropy - many records share an identical seconds component.

`date +%s` is evaluated once per loop, so every 100 generated records have the same epoch-seconds; only the nanosecond suffix varies. For large ingests this causes artificial "spiky" time-series clusters.

```diff
-        UNIX_NANOS+=("$(date +%s)$(printf '%09d' $((RANDOM % 1000000000)))")
+        now_s=$(date +%s)
+        UNIX_NANOS+=("$now_s$(printf '%09d' $((RANDOM % 1000000000)))")
```

(Or offset `now_s` by the same random seconds you already apply to the RFC-3339 timestamp to keep the two fields consistent.)
`195-199`: Manual JSON escaping is brittle - use `jq -R @json` instead.

The current two-stage `sed` chain misses backslashes, control chars, etc. A safer one-liner escape:

```bash
escaped_user_agent=$(printf '%s' "$user_agent" | jq -Rsa @json | sed -e 's/^"//' -e 's/"$//')
```

This guarantees valid UTF-8 / JSON for any header value.
`41-48`: Nit: `$AUTH_HEADER` is constant - switch to `-H@"<(printf …)"` to avoid arg-length limits.

With long Basic-Auth tokens the fixed `-H "$AUTH_HEADER"` may hit argv size limits on macOS/BSD. Using a process-substitution file avoids quoting issues and is more portable:

```bash
curl_args+=( -H@"<(printf 'Authorization: Basic %s' "$BASIC")" )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
🪛 Shellcheck (0.10.0)
resources/ingest_demo_data.sh
[warning] 85-85: Declare and assign separately to avoid masking return values.
(SC2155)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
Actionable comments posted: 2
🧹 Nitpick comments (1)
resources/ingest_demo_data.sh (1)

`41-44`: Duplicate `-w` option - the first instance is silently ignored.

`curl_args` adds `-w '\n%{http_code}'` (line 43) and later adds another `-w "%{http_code}"` (line 82). Only the last `-w` is honored, so the earlier one is dead code and misleading.

```diff
-        -w '\n%{http_code}'
```

Also applies to: 82-83
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
Force-pushed from 3953f3c to 39077f1
Actionable comments posted: 1
🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

`391-404`: Inconsistent use of retry/timeout logic for target creation.

`create_target` falls back to a bare `curl -s …` call. Unlike the rest of the script it:

- lacks retry/back-off on transient failures,
- ignores connect/read time-outs,
- omits 2xx status-code validation.

Replacing it with `curl_with_retry` keeps error-handling behaviour uniform.

```diff
-    response=$(curl -s -H "Content-Type: application/json" -H "$AUTH_HEADER" -X POST "$P_URL/api/v1/targets" -d @- << EOF
+    response=$(curl_with_retry "$P_URL/api/v1/targets" "POST" @- "application/json" 3 << EOF
```

(You may need to extend `curl_with_retry` to support stdin payloads, or write the JSON to a temp file and pass its path; a temp-file sketch follows below.)
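If the positional signature stays, the temp-file route might look like this sketch, reusing the optional sixth `data_file` parameter discussed in the interface review (the payload fields are assumptions):

```bash
# Write the JSON payload to a temp file and hand curl_with_retry its path
# via the optional sixth data_file parameter.
payload_file=$(mktemp)
trap 'rm -f "$payload_file"' EXIT
cat > "$payload_file" <<'EOF'
{
  "name": "demo-webhook",
  "type": "webhook"
}
EOF

response=$(curl_with_retry "$P_URL/api/v1/targets" "POST" "" "application/json" 3 "$payload_file")
```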
`211-215`: Manual JSON string escaping is fragile.

Using `sed` to escape quotes in `user_agent` (and later for SQL/filter JSON) risks subtle breakage for other special characters (backslashes, control bytes, UTF-8). A safer pattern is to delegate JSON encoding to a tool such as `jq`:

```bash
json_record=$(jq -n --arg body "$body" \
    --arg time "$unix_nano" \
    --arg ua "$user_agent" \
    '{
        body: $body,
        time_unix_nano: $time,
        user_agent_original: $ua
        # … other fields …
    }')
```

This prevents malformed payloads if input ever contains unexpected characters.
`145-151`: `date` invoked 200× inside the outer loop - precompute once.

`TIMESTAMPS` and `UNIX_NANOS` are generated with separate `date` invocations for each of the 100 iterations. While acceptable for 10k records, the per-iteration process spawns add up if the pool size grows later.

Consider computing `base_now=$(date +%s)` once and deriving offsets with arithmetic (`printf -v ts … $((base_now - rand))`) to remove the per-iteration process spawn; a sketch follows below.
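A sketch of that precomputation using bash's built-in time formatting instead of `date(1)` per iteration - this assumes bash ≥ 4.2 for `printf '%(...)T'`, with `TZ` pinned so the output is UTC:

```bash
#!/usr/bin/env bash
export TZ=UTC                  # %(...)T formats in local time, so pin it to UTC
base_now=$(date +%s)           # the only date(1) invocation

declare -a TIMESTAMPS=() UNIX_NANOS=()
for ((i = 0; i < 100; i++)); do
    ts_s=$(( base_now - RANDOM % 3600 ))   # random offset within the last hour
    # Format the epoch with the printf builtin - no subprocess spawned.
    printf -v iso '%(%Y-%m-%dT%H:%M:%S)T.%03dZ' "$ts_s" $(( RANDOM % 1000 ))
    TIMESTAMPS+=("$iso")
    UNIX_NANOS+=("${ts_s}$(printf '%09d' $(( RANDOM % 1000000000 )))")
done
```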
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- resources/ingest_demo_data.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (4)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.509Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
API: GET /api/v1/demodata?action=ingest to ingest 10k records from the script; GET /api/v1/demodata?action=filters to create saved SQLs and saved filters from the script
Force-pushed from 39077f1 to b0b5b2c
Actionable comments posted: 1
🧹 Nitpick comments (3)
resources/ingest_demo_data.sh (3)

`1-2`: Enable strict Bash modes for more predictable failure handling.

Adding `set -euo pipefail` (or at least `set -e` and `set -o pipefail`) right after the shebang ensures the script aborts on the first failed command, on undefined variables, and on pipeline errors - preventing silent mis-ingestion when e.g. `curl` fails inside a subshell.

```diff
 #!/usr/bin/env bash
+set -euo pipefail
```
`20-33`: `curl_with_retry` - document/validate the 6-argument contract.

The function now expects an optional sixth `data_file` argument, but call-sites (e.g. `send_batch`, `create_alerts`) still pass five. That's fine, yet without a comment or an `if (( $# < 5 || $# > 6 ))` guard, future maintainers may unintentionally pass seven+ params and silently shift `$data_file`.

Consider adding a short doc-string or an argument-count check to make the interface self-evident.
`397-405`: Gracefully handle a missing `jq` dependency.

The script now relies on `jq` to extract IDs, but there's no pre-flight check. On machines without `jq`, `create_target` (and dashboard creation) will fail with hard-to-read errors.

```bash
command -v jq >/dev/null 2>&1 || { log_error "jq is required but not installed - aborting"; return 1; }
```

Insert this check once near the top (e.g. after the config section) to fail fast and inform users what they're missing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
- Cargo.toml (1 hunks)
- resources/ingest_demo_data.sh (1 hunks)
- src/handlers/http/cluster/mod.rs (2 hunks)
- src/handlers/http/demo_data.rs (1 hunks)
- src/handlers/http/ingest.rs (2 hunks)
- src/handlers/http/mod.rs (1 hunks)
- src/handlers/http/modal/ingest_server.rs (1 hunks)
- src/handlers/http/modal/query_server.rs (1 hunks)
- src/handlers/http/modal/server.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- Cargo.toml
- src/handlers/http/modal/ingest_server.rs
- src/handlers/http/modal/query_server.rs
- src/handlers/http/mod.rs
- src/handlers/http/ingest.rs
- src/handlers/http/modal/server.rs
- src/handlers/http/demo_data.rs
- src/handlers/http/cluster/mod.rs
🧰 Additional context used
🧠 Learnings (1)
resources/ingest_demo_data.sh (5)
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:125-131
Timestamp: 2025-07-14T02:11:24.559Z
Learning: In the Parseable demo data script, the timestamp generation using `date -u +%Y-%m-%dT%H:%M:%S.%03dZ` works correctly and produces valid ISO timestamps with milliseconds, despite initial analysis suggesting %03d might not be a valid placeholder. The user verified this produces timestamps like "2025-07-13T16:53:15.013Z" in practice.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/parseable/streams.rs:351-355
Timestamp: 2025-06-16T05:20:18.593Z
Learning: In the Parseable codebase, arrow files are expected to always have valid creation or modified timestamps as a basic system assumption. The conversion flow uses expect() on file metadata operations to enforce this invariant with fail-fast behavior rather than graceful error handling, as violations represent fundamental system issues that should cause immediate failures.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1346
File: src/storage/object_storage.rs:838-840
Timestamp: 2025-06-16T02:09:49.200Z
Learning: In Parseable server, filenames for arrow files are generated server-side during ingestion using a strict format pattern: `{stream_hash}.date={date}.hour={hour}.minute={minute}.{custom_partitions}{hostname}.data.{extension}`. This guarantees that when splitting by dots, the second element will always be `date={value}`, making direct indexing safe when extracting the date part by splitting on '=' and accessing index 1.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1370
File: resources/ingest_demo_data.sh:12-14
Timestamp: 2025-07-14T11:29:22.509Z
Learning: In the Parseable codebase, the base64 encoding for Basic Auth headers in resources/ingest_demo_data.sh works correctly as implemented without needing to strip trailing newlines. The user has validated this works with their server environment.
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1305
File: src/handlers/http/users/dashboards.rs:125-148
Timestamp: 2025-05-01T10:33:51.767Z
Learning: When adding a tile to a dashboard in `add_tile()` function, the tile ID must be provided by the client and should not be generated by the server. If the tile ID is missing (nil), the API should fail the operation with an appropriate error message.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka aarch64-apple-darwin
API

- GET /api/v1/demodata?action=ingest - ingest 10k records from the script
- GET /api/v1/demodata?action=filters - create saved SQLs and saved filters from the script
- GET /api/v1/demodata?action=alerts - create a webhook target and 3 alerts
- GET /api/v1/demodata?action=dashboards - create a dashboard with 4 tiles: timeseries, line, area, and bar charts